{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Homework 05 - Naive Bayes\n",
    "Isac Lira, 371890"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Questão 1\n",
    "\n",
    "Implemente um classifacor Naive Bayes para o problema de predizer a qualidade de um carro. Para este fim, utilizaremos um conjunto de dados referente a qualidade de carros, disponível no [UCI](https://archive.ics.uci.edu/ml/datasets/car+evaluation). Este dataset de carros possui as seguintes features e classe:\n",
    "\n",
    "** Attributos **\n",
    "1. buying: vhigh, high, med, low\n",
    "2. maint: vhigh, high, med, low\n",
    "3. doors: 2, 3, 4, 5, more\n",
    "4. persons: 2, 4, more\n",
    "5. lug_boot: small, med, big\n",
    "6. safety: low, med, high\n",
    "\n",
    "** Classes **\n",
    "1. unacc, acc, good, vgood\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 394,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Importa as bibliotecas \n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>buying</th>\n",
       "      <th>maint</th>\n",
       "      <th>doors</th>\n",
       "      <th>persons</th>\n",
       "      <th>lug_book</th>\n",
       "      <th>safety</th>\n",
       "      <th>class</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>vhigh</td>\n",
       "      <td>vhigh</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>small</td>\n",
       "      <td>low</td>\n",
       "      <td>unacc</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>vhigh</td>\n",
       "      <td>vhigh</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>small</td>\n",
       "      <td>med</td>\n",
       "      <td>unacc</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>vhigh</td>\n",
       "      <td>vhigh</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>small</td>\n",
       "      <td>high</td>\n",
       "      <td>unacc</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>vhigh</td>\n",
       "      <td>vhigh</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>med</td>\n",
       "      <td>low</td>\n",
       "      <td>unacc</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>vhigh</td>\n",
       "      <td>vhigh</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>med</td>\n",
       "      <td>med</td>\n",
       "      <td>unacc</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  buying  maint doors persons lug_book safety  class\n",
       "0  vhigh  vhigh     2       2    small    low  unacc\n",
       "1  vhigh  vhigh     2       2    small    med  unacc\n",
       "2  vhigh  vhigh     2       2    small   high  unacc\n",
       "3  vhigh  vhigh     2       2      med    low  unacc\n",
       "4  vhigh  vhigh     2       2      med    med  unacc"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Carrega os dados\n",
    "\n",
    "cols = ['buying','maint','doors','persons','lug_book','safety','class']\n",
    "carset = pd.read_csv('carData.csv',names=cols)\n",
    "carset.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 1728 entries, 0 to 1727\n",
      "Data columns (total 7 columns):\n",
      "buying      1728 non-null object\n",
      "maint       1728 non-null object\n",
      "doors       1728 non-null object\n",
      "persons     1728 non-null object\n",
      "lug_book    1728 non-null object\n",
      "safety      1728 non-null object\n",
      "class       1728 non-null object\n",
      "dtypes: object(7)\n",
      "memory usage: 94.6+ KB\n"
     ]
    }
   ],
   "source": [
    "carset.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sem valores faltantes. Amém!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 388,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Implementa o classificador naive bayes\n",
    "\n",
    "class naive_classifier(object):\n",
    "    def _init_(self):\n",
    "        self.priors = {}\n",
    "        self.lh_probs = {}\n",
    "        self.train = None\n",
    "          \n",
    "    def calc_prior_probs(self):\n",
    "        train = self.train\n",
    "        classes = train['class'].unique()\n",
    "        priors = {}\n",
    "        for class_ in classes:            \n",
    "            priors[class_] = len(train[train['class'] == class_])/len(train)            \n",
    "        self.priors = priors\n",
    "        \n",
    "    def calc_likelihood_probs(self):\n",
    "        train = self.train\n",
    "        columns = carset.drop(['class'],axis=1).columns\n",
    "        classes = carset['class'].unique() \n",
    "        lh_probs = {}\n",
    "        for column in columns:\n",
    "            for class_ in classes:\n",
    "                for cat in carset[column].unique():                    \n",
    "                    cat_prior_prob = sum(train[column]==cat)/len(train)\n",
    "                    conditional_prob = sum((train['class']==class_) & (train[column]==cat))/sum(train['class']==class_)                                        \n",
    "                    \n",
    "                    if not conditional_prob: conditional_prob = 0.001   \n",
    "                    \n",
    "                    lh_probs[column,cat,class_] = conditional_prob/cat_prior_prob      \n",
    "        self.lh_probs = lh_probs\n",
    "    \n",
    "    def fit(self,train):\n",
    "        self.train = train\n",
    "        self.calc_prior_probs()\n",
    "        self.calc_likelihood_probs()\n",
    "    \n",
    "    def predict(self,xtest):\n",
    "        columns = self.train.drop(['class'],axis=1).columns\n",
    "        classes = self.train['class'].unique()  \n",
    "        predictions = []\n",
    "        \n",
    "        for i in xtest.index:\n",
    "            posterior_prob = {}\n",
    "            x = xtest.loc[i]\n",
    "            for class_ in classes:\n",
    "                posterior_prob[class_] = 1\n",
    "                for column in columns: \n",
    "                    #cat_prior = sum(self.train[column]==x[column])\n",
    "                    posterior_prob[class_] *= self.lh_probs[column,x[column],class_]                        \n",
    "                posterior_prob[class_] *= self.priors[class_]             \n",
    "            predictions.append(max(posterior_prob,key=posterior_prob.get))    \n",
    "        return predictions     "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 389,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Instancia um classificador e realiza predições para os dados de teste\n",
    "\n",
    "msk = np.random.rand(len(carset))<=0.8\n",
    "xtrain = carset[msk]\n",
    "xtest = carset[~msk]\n",
    "\n",
    "ytrain = xtrain['class']\n",
    "ytest = xtest['class']\n",
    "\n",
    "naive = naive_classifier()\n",
    "naive.fit(xtrain)\n",
    "pred = naive.predict(xtest)\n",
    "acc_my_nb = np.mean(pred==ytest)*100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 390,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.metrics import classification_report\n",
    "my_nv_report = classification_report(ytest,pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Questão 2\n",
    "Crie uma versão de sua implementação usando as funções disponíveis na biblioteca SciKitLearn para o Naive Bayes ([veja aqui](http://scikit-learn.org/stable/modules/naive_bayes.html)) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 391,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>buying_high</th>\n",
       "      <th>buying_low</th>\n",
       "      <th>buying_med</th>\n",
       "      <th>buying_vhigh</th>\n",
       "      <th>maint_high</th>\n",
       "      <th>maint_low</th>\n",
       "      <th>maint_med</th>\n",
       "      <th>maint_vhigh</th>\n",
       "      <th>doors_2</th>\n",
       "      <th>doors_3</th>\n",
       "      <th>...</th>\n",
       "      <th>doors_5more</th>\n",
       "      <th>persons_2</th>\n",
       "      <th>persons_4</th>\n",
       "      <th>persons_more</th>\n",
       "      <th>lug_book_big</th>\n",
       "      <th>lug_book_med</th>\n",
       "      <th>lug_book_small</th>\n",
       "      <th>safety_high</th>\n",
       "      <th>safety_low</th>\n",
       "      <th>safety_med</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 21 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   buying_high  buying_low  buying_med  buying_vhigh  maint_high  maint_low  \\\n",
       "1            0           0           0             1           0          0   \n",
       "2            0           0           0             1           0          0   \n",
       "3            0           0           0             1           0          0   \n",
       "4            0           0           0             1           0          0   \n",
       "5            0           0           0             1           0          0   \n",
       "\n",
       "   maint_med  maint_vhigh  doors_2  doors_3     ...      doors_5more  \\\n",
       "1          0            1        1        0     ...                0   \n",
       "2          0            1        1        0     ...                0   \n",
       "3          0            1        1        0     ...                0   \n",
       "4          0            1        1        0     ...                0   \n",
       "5          0            1        1        0     ...                0   \n",
       "\n",
       "   persons_2  persons_4  persons_more  lug_book_big  lug_book_med  \\\n",
       "1          1          0             0             0             0   \n",
       "2          1          0             0             0             0   \n",
       "3          1          0             0             0             1   \n",
       "4          1          0             0             0             1   \n",
       "5          1          0             0             0             1   \n",
       "\n",
       "   lug_book_small  safety_high  safety_low  safety_med  \n",
       "1               1            0           0           1  \n",
       "2               1            1           0           0  \n",
       "3               0            0           1           0  \n",
       "4               0            0           0           1  \n",
       "5               0            1           0           0  \n",
       "\n",
       "[5 rows x 21 columns]"
      ]
     },
     "execution_count": 391,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Transform as features categóricas em numéricas\n",
    "\n",
    "from sklearn.naive_bayes import GaussianNB\n",
    "from sklearn.preprocessing import LabelEncoder\n",
    "\n",
    "le = LabelEncoder()\n",
    "le.fit(xtrain['class'])\n",
    "ytrain = le.transform(xtrain['class'])\n",
    "ytest = le.transform(xtest['class'])\n",
    "\n",
    "xtrain =pd.get_dummies(xtrain.drop(['class'],axis=1))\n",
    "xtest = pd.get_dummies(xtest.drop(['class'],axis=1))\n",
    "\n",
    "xtrain.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 392,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Aplica o classifcador Naive Bayes (sklearn)\n",
    "\n",
    "nb = GaussianNB()\n",
    "nb.fit(xtrain,ytrain)\n",
    "pred = nb.predict(xtest)\n",
    "acc_sk_nb = np.mean(pred == ytest)*100\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Questão 3\n",
    "\n",
    "Analise a acurácia dos dois algoritmos e discuta a sua solução."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 393,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Acurária do classificador(from scratch): 84.2239185751\n",
      "Acurária do classificador(sklearn version): 80.1526717557\n"
     ]
    }
   ],
   "source": [
    "print('Acurária do classificador(from scratch):',acc_my_nb)\n",
    "print('Acurária do classificador(sklearn version):',acc_sk_nb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A acurácia obtida pelos classifcadores foram bem próximas. Como na versão do sklearn as features precisam ser numéricas, a tranformação categórica-numérica pode afetar o desempenho."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
